Tagging for Learning: Collecting Thematic Relations from Corpus

Authors

  • Uri Zernik
  • Paul S. Jacobs
Abstract

Recent work in text analysis has suggested that data on words that frequently occur together reveal important information about text content. Co-occurrence relations can serve two main purposes in language processing. First, the statistics of co-occurrence have been shown to produce accurate results in syntactic analysis. Second, the way that words appear together can help in assigning thematic roles in semantic interpretation. This paper discusses a method for collecting co-occurrence data, acquiring lexical relations from the data, and applying these relations to semantic analysis.
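
The abstract does not spell out how co-occurrence data are collected or scored, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a fixed forward window of `window` tokens and scores each word pair with pointwise mutual information (PMI), a common association measure chosen here purely for illustration. The function names `cooccurrence_counts` and `pmi` are hypothetical.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, window=5):
    """Count single words and ordered pairs (w1, w2) where w2 follows w1
    within `window` tokens. The window size is an illustrative assumption."""
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w1 in enumerate(tokens):
        # Look only forward, so each ordered pair occurrence is counted once.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pair_counts[(w1, tokens[j])] += 1
    return word_counts, pair_counts


def pmi(pair, word_counts, pair_counts):
    """Pointwise mutual information log2(P(x, y) / (P(x) P(y))).
    Joint and marginal probabilities are both estimated by dividing by the
    corpus length N, a simple approximation used here as an assumption."""
    n = sum(word_counts.values())
    joint = pair_counts[pair]
    if joint == 0:
        return float("-inf")  # the pair was never observed together
    p_xy = joint / n
    p_x = word_counts[pair[0]] / n
    p_y = word_counts[pair[1]] / n
    return math.log2(p_xy / (p_x * p_y))


if __name__ == "__main__":
    tokens = "the company acquired the firm and the company sold the firm".split()
    wc, pc = cooccurrence_counts(tokens, window=2)
    print(pc[("company", "acquired")], pmi(("company", "acquired"), wc, pc))
```

High-PMI pairs such as a verb and its typical object are candidates for the kind of lexical relation the abstract mentions; deciding which pairs reflect thematic roles would require further steps that this sketch does not attempt.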

Similar resources

Learning Thematic Role Relations for Wordnets

In this paper, I present a method for learning thematic role relations (selectional preferences) for wordnets by means of statistical corpus analysis. An evaluation on a gold standard, which I extracted from EuroWordNet, shows that this method achieves a learning accuracy of up to 77%. I also propose a preprocessing step for a partial lexical disambiguation of the input data. This disambiguatio...

The language of collaborative tagging

Collaborative tagging is the process whereby people attach keywords, known as tags, to digital resources, such as text and images, in order to render them retrievable in the future. This thesis investigates how tags submitted by users in collaborative tagging systems function as descriptors of a resource’s perceived content. Using computational and theoretical tools, I compare collaborative tag...

Massively parallel learning of part-of-speech disambiguation

This paper presents a method for massively parallel learning of part-of-speech disambiguation based on a minmax modular neural network model. The method has three main steps. Firstly, a large-scale tagging problem is decomposed into a number of relatively smaller and simpler subproblems according to the class relations among a given training corpus. Secondly, all of the subproblems are learned ...

Learning "Generalization/Specialization" Relations between Concepts - Application for Automatically Building Thematic Document Hierarchies

We introduce a new method for automatically constructing concept hierarchies where the concept nodes follow a generalization / specialization relation. Starting from a set of concepts automatically extracted from a corpus, we show how to learn generalization / specialization relations between couples of concepts and how this leads to the construction of the hierarchy. We present an application ...

Part-of-Speech Tagging for Code-Mixed English-Hindi Twitter and Facebook Chat Messages

The paper reports work on collecting and annotating code-mixed English-Hindi social media text (Twitter and Facebook messages), and experiments on automatic tagging of these corpora, using both a coarse-grained and a fine-grained part-of-speech tag set. We compare the performance of a combination of language specific taggers to that of applying four machine learning algorithms to the task (Condi...

Publication date: 1990